Filled Pause
Research Center

Investigating 'um' and 'uh' and other hesitation phenomena

June 30th, 2015

Kaken report published. And that's a wrap!

The three-year grant I received from the Japan Society for the Promotion of Science (JSPS) as a Grant-in-Aid for Scientific Research (called Kaken-hi here in Japan) ended this past March. However, some paperwork remained: I was obliged to write a final report detailing the primary outcomes of the project. Writing the report took a bit of effort since I had to write it in Japanese. The project was ultimately funded by the public of Japan, so by custom, recipients write a report that is accessible to the general public. For me, that was probably easier than writing scholarly-level Japanese would have been, but it was still a difficult enough job as it was.

If you're interested, you can download and read the report at this link (in Japanese). But this post gives a short synopsis in English. The project had two main goals. The first was to organize a crosslinguistic corpus that would permit the study of hesitation phenomena in speech and the comparison of speakers' production of these phenomena in both their native language and a second language. This goal resulted in the creation of the Crosslinguistic Corpus of Hesitation Phenomena (CCHP). I reported on the corpus at numerous events, most notably at INTERSPEECH 2013 in Lyon, France (described in this news post here).

Crosslinguistic Corpus of Hesitation Phenomena (CCHP) logo

The second goal was to use the corpus to evaluate the development of fluency in native Japanese speakers of English as a second language (the participants in CCHP) and, in particular, the developmental trajectory of hesitation phenomena in their L2 speech relative to their L1 speech patterns. The key findings were that their L1 and L2 speech showed high correlations with respect to silent pause duration, but less so with respect to silent pause rate and speech rate. This means that silent pause duration is not a reliable feature to use when evaluating their L2 speech: it is basically a reflection of their first language speech and is not a meaningful indicator of L2 proficiency development. Silent pause rate and speech rate, on the other hand, should be more reliable. I will be reporting on this at the International Congress of Phonetic Sciences (ICPhS) in Glasgow later this summer.

Correspondence between L1 Japanese and L2 English silent pause duration in CCHP
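
For readers who want to try this kind of L1/L2 comparison on their own data, here is a minimal sketch in Python. It is not the analysis code from the project: the per-speaker numbers are made up for illustration (they are not CCHP data), the field names are hypothetical, and the choice of Pearson's r is an assumption, since this post does not say which correlation measure was used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Invented per-speaker values for illustration only (NOT CCHP data).
# Durations are in seconds; rates are pauses per minute. Field names are hypothetical.
speakers = [
    {"l1_pause_dur": 0.48, "l2_pause_dur": 0.55, "l1_pause_rate": 14.0, "l2_pause_rate": 31.0},
    {"l1_pause_dur": 0.62, "l2_pause_dur": 0.70, "l1_pause_rate": 18.0, "l2_pause_rate": 22.0},
    {"l1_pause_dur": 0.75, "l2_pause_dur": 0.81, "l1_pause_rate": 12.0, "l2_pause_rate": 27.0},
    {"l1_pause_dur": 0.55, "l2_pause_dur": 0.60, "l1_pause_rate": 20.0, "l2_pause_rate": 35.0},
]

# Correlate each speaker's L1 value with the same speaker's L2 value.
for measure in ("pause_dur", "pause_rate"):
    l1 = [s["l1_" + measure] for s in speakers]
    l2 = [s["l2_" + measure] for s in speakers]
    print(measure, round(pearson_r(l1, l2), 2))
```

The idea is that a measure with a high L1/L2 correlation mostly tells you about the speaker's habits in their first language, while a measure with a weaker correlation leaves more room to reflect L2 proficiency.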

Although these were the two main aims of the project, I also managed to accomplish a few other things, notably the construction of a proof-of-concept Java application that helps L2 speakers develop their speech fluency by giving them real-time feedback based on ongoing measurement of fluency parameters in their speech. The application is called Fluidity and was described in a news post here.

I also used Amazon Mechanical Turk to crowd-source fluency ratings for samples of the corpus speech data. This was pretty ambitious for my first major use of MTurk, but it went surprisingly well. I am still crunching the data from that, so I don't have any results to share yet. But I am cautiously hopeful that with the fluency rating data, I'll be able to look more closely at the relationship between utterance fluency and perceptual fluency in L2 speech.

[Note: This post was published in September 2020 but has been backdated to reflect the actual timing of the events described here.]